Multi-Language Image Description with Neural Sequence Models
نویسندگان
چکیده
In this paper we present an approach to multi-language image description bringing together insights from neural machine translation and neural image description. To create a description of an image for a given target language, our sequence generation models condition on feature vectors from the image, the description from the source language, and/or a multimodal vector computed over the image and a description in the source language. In image description experiments on the IAPR-TC12 dataset of images aligned with English and German sentences, we find significant and substantial improvements in BLEU4 and Meteor scores for models trained over multiple languages, compared to a monolingual baseline.
منابع مشابه
Volumetric soil moisture estimation using Sentinel 1 and 2 satellite images
Surface soil moisture is an important variable that plays a crucial role in the management of water and soil resources. Estimating this parameter is one of the important applications of remote sensing. One of the remote sensing techniques for precise estimation of this parameter is data-driven models. In this study, volumetric soil moisture content was estimated using data-driven models, suppor...
متن کاملMultimodal Attention for Neural Machine Translation
The attention mechanism is an important part of the neural machine translation (NMT) where it was reported to produce richer source representation compared to fixed-length encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simult...
متن کاملA Multi-scale Multiple Instance Video Description Network
Generating natural language descriptions for in-thewild videos is a challenging task. Most state-of-the-art methods for solving this problem borrow existing deep convolutional neural network (CNN) architectures (Alexnet, Googlenet) to extract a visual representation of the input video. However, these deep CNN architectures are designed for single-label centered-positioned object classification....
متن کاملImage Captioning with Object Detection and Localization
Automatically generating a natural language description of an image is a task close to the heart of image understanding. In this paper, we present a multi-model neural network method closely related to the human visual system that automatically learns to describe the content of images. Our model consists of two sub-models: an object detection and localization model, which extract the informatio...
متن کاملSubhashini VenugopalanProposal
For most people, watching a brief video and describing what happened (in words) is an easy task. For machines, extracting the meaning from video pixels and generating a sentence description is a very complex problem. The goal of my research is to develop models that can automatically generate natural language (NL) descriptions for events in videos. As a first step, this proposal presents deep r...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1510.04709 شماره
صفحات -
تاریخ انتشار 2015